Smart Paper Notes

MultiMatch: Multi-task Learning for Semi-supervised Domain Generalization

Lei Qi , Hongpeng Yang , Yinghuan Shi , Xin Geng

Category: Computer Vision

2022-08-11

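Domain generalization (DG) aims to learn a model on source domains that generalizes well to unseen target domains. Despite its great success, most existing methods require label information for all training samples in the source domains, which is time-consuming and expensive in real-world applications. In this paper, we address the semi-supervised domain generalization (SSDG) task, in which only some label information is available in each source domain. To tackle the task, we first analyze the theory of multi-domain learning, which highlights that 1) mitigating the impact of the domain gap and 2) exploiting all samples to train the model can effectively reduce the generalization error in each source domain and thus improve the quality of pseudo-labels. Based on this analysis, we propose MultiMatch, which extends FixMatch to a multi-task learning framework to produce high-quality pseudo-labels for SSDG. Specifically, we treat each training domain as a single task (i.e., a local task) and combine all training domains (i.e., the global task) to train an extra task for the unseen test domain. In the multi-task framework, we use an independent BN and classifier for each task, which effectively alleviates the interference between different domains during pseudo-labeling. Moreover, most parameters in the framework are shared and can therefore be trained by all training samples. To further improve pseudo-label accuracy and the model's generalization, we fuse the predictions of the global task and the local tasks during training and testing, respectively. A series of experiments validates the effectiveness of the proposed method, which outperforms existing semi-supervised methods and SSDG methods on several benchmark DG datasets.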
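The pseudo-labeling step described in this abstract can be illustrated with a minimal sketch. It assumes that the global-task and local-task predictions are fused by simple averaging and that a FixMatch-style confidence threshold of 0.95 is applied; the abstract specifies neither detail, and the function names `fuse_predictions` and `pseudo_label` are hypothetical.

```python
# Hypothetical sketch: fuse global-task and local-task class probabilities,
# then keep a pseudo-label only when the fused confidence is high enough.
# Averaging and the 0.95 threshold are assumptions, not the paper's settings.


def fuse_predictions(global_probs, local_probs):
    """Average two class-probability vectors of equal length."""
    if len(global_probs) != len(local_probs):
        raise ValueError("probability vectors must have the same length")
    return [(g + l) / 2.0 for g, l in zip(global_probs, local_probs)]


def pseudo_label(global_probs, local_probs, threshold=0.95):
    """Return (class_index, confidence), or None if below the threshold."""
    fused = fuse_predictions(global_probs, local_probs)
    confidence = max(fused)
    if confidence < threshold:
        return None  # too uncertain: the sample gets no pseudo-label
    return fused.index(confidence), confidence


# Both heads agree strongly, so the sample receives a pseudo-label.
confident = pseudo_label([0.97, 0.02, 0.01], [0.99, 0.005, 0.005])
# Both heads are unsure, so the sample is skipped.
uncertain = pseudo_label([0.60, 0.30, 0.10], [0.50, 0.40, 0.10])
```

Returning `None` for low-confidence samples mirrors the usual FixMatch practice of masking them out of the unsupervised loss.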

Translated by Google Translate

Label Distribution Learning for Generalizable Multi-source Person Re-identification

Lei Qi , Jiaying Shen , Jiaqi Liu , Yinghuan Shi , Xin Geng

Category: Computer Vision

2022-04-12

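Person re-identification (Re-ID) is a key technology in video surveillance systems and has achieved significant success in the supervised setting. However, because of the domain gap between the available source domains and unseen target domains, it is difficult to apply a supervised model directly to arbitrary unseen domains. In this paper, we propose a novel label distribution learning (LDL) method for the generalizable multi-source person Re-ID task (i.e., multiple source domains are available, and the test domain is unseen during training). The method aims to explore the relations among different classes and to mitigate the domain shift across domains, so as to improve the model's discrimination while learning domain-invariant features. Specifically, during training we produce label distributions in an online manner to mine the relation information among classes, which benefits the extraction of discriminative features. In addition, we revise the label distribution of each class so that it pays more, and equal, attention to the other domains that the class does not belong to, which effectively reduces the domain gap across domains and yields domain-invariant features. We also provide a theoretical analysis showing that the proposed method can effectively deal with the domain-shift issue. Extensive experiments on multiple benchmark datasets validate the effectiveness of the proposed method and show that it can outperform state-of-the-art methods. Further analysis also reveals the superiority of the proposed method.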


General and Domain Adaptive Chinese Spelling Check with Error Consistent Pretraining

Qi Lv , Ziqiang Cao , Lei Geng , Chunhui Ai , Xu Yan , Guohong Fu

Category: Natural Language Processing

2022-03-21

The lack of labeled data is one of the significant bottlenecks for Chinese Spelling Check (CSC). Existing research expands the supervised corpus by automatically generating data from unlabeled text. However, there is a big gap between real input scenarios and the automatically generated corpus. We therefore develop a competitive general speller, ECSpell, which adopts an error-consistent masking strategy to create data for pretraining. This strategy specifies the error types of the automatically generated sentences so that they are consistent with real scenarios. The experimental results indicate that our model outperforms previous state-of-the-art models on the general benchmark. Moreover, spellers often work within a particular domain in real life. Experiments on the domain-specific datasets we built show that general models perform very poorly because of the many uncommon domain terms. Inspired by the common practice of input methods, we propose adding an alterable user dictionary to handle the zero-shot domain adaptation problem. Specifically, we attach a User Dictionary guided inference module (UD) to a general token-classification-based speller. Our experiments demonstrate that ECSpell$^{UD}$, namely ECSpell combined with UD, surpasses all the other baselines by a large margin, even approaching the performance on the general benchmark.


A Novel Mix-normalization Method for Generalizable Multi-source Person Re-identification

Lei Qi , Lei Wang , Yinghuan Shi , Xin Geng

Category: Computer Vision

2022-01-24

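Person re-identification (Re-ID) has achieved great success in the supervised scenario. However, because the model overfits to the seen source domains, it is difficult to transfer a supervised model directly to arbitrary unseen domains. In this paper, we aim to tackle the generalizable multi-source person Re-ID task (i.e., there are multiple available source domains, and the test domain is unseen during training) from the data augmentation perspective. We therefore propose a novel method, termed MixNorm, which consists of domain-aware mix-normalization (DMN) and domain-aware center regularization (DCR). Unlike conventional data augmentation, the proposed domain-aware mix-normalization enhances the diversity of features during training from the perspective of a neural network's normalization, which effectively alleviates the model's overfitting to the source domains and thereby improves its generalization to unseen domains. To better learn a domain-invariant model, we further develop the domain-aware center regularization to map the diverse generated features into the same space. Extensive experiments on multiple benchmark datasets validate the effectiveness of the proposed method and show that it can outperform state-of-the-art methods. Further analysis also reveals the superiority of the proposed method.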


Background-aware Classification Activation Map for Weakly Supervised Object Localization

Lei Zhu , Qi She , Qian Chen , Xiangxi Meng , Mufeng Geng , Lujia Jin , Zhe Jiang , Bin Qiu , Yunfei You , Yibao Zhang

Category: Computer Vision

2021-12-29

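By supervising the learning process with image-level classification masks, weakly supervised object localization (WSOL) relaxes the requirement for dense annotations in object localization. However, current WSOL methods suffer from excessive activation of background locations and need post-processing to obtain the localization mask. This paper attributes these issues to the unawareness of background cues and proposes the background-aware classification activation map (B-CAM) to simultaneously learn localization scores for both object and background using only image-level labels. In our B-CAM, two image-level features, aggregated from the pixel-level features of potential background and object locations, are used to purify the object feature from the object-related background and to represent the feature of the pure-background sample, respectively. Based on these two features, an object classifier and a background classifier are then learned to determine the binary object localization mask. Our B-CAM can be trained in an end-to-end manner based on a proposed stagger classification loss, which not only improves object localization but also suppresses background activation. Experiments show that our B-CAM outperforms one-stage WSOL methods on the CUB-200, OpenImages and VOC2012 datasets.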


Unsupervised Domain Generalization for Person Re-identification: A Domain-specific Adaptive Framework

Lei Qi , Lei Wang , Yinghuan Shi , Xin Geng

Category: Computer Vision

2021-11-30

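Domain generalization (DG) has recently attracted great attention in person re-identification (ReID). It aims to make a model trained on multiple source domains generalize to an unseen target domain. Although promising progress has been achieved, existing methods usually require the source domains to be labeled, which can be a significant burden for practical ReID tasks. In this paper, we investigate unsupervised domain generalization for ReID by assuming that no label is available for any source domain. To address this challenging setting, we propose a simple and efficient domain-specific adaptive framework, realized by an adaptive normalization module built upon batch and instance normalization techniques. In doing so, we successfully produce reliable pseudo-labels for training and enhance the model's domain generalization capability as required. In addition, we show that our framework can even be applied to improve person ReID under the settings of supervised domain generalization and unsupervised domain adaptation, demonstrating competitive performance with respect to relevant methods. Extensive experimental studies on benchmark datasets validate the proposed framework. The significance of our work lies in showing the potential of unsupervised domain generalization for person ReID and in setting a strong baseline for further research on this topic.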
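As a rough illustration of a normalization module that builds on both batch and instance normalization, the sketch below blends the two outputs with a fixed weight for a single channel. This is only one possible reading under stated assumptions: the abstract does not say how the two are combined, and the function `mixed_norm` and its `weight` parameter are hypothetical.

```python
import math

# Hypothetical sketch: blend batch normalization and instance normalization
# for one channel. The fixed blending weight is an assumption; the paper's
# adaptive module may combine the two statistics differently.


def _mean_var(values):
    """Return the mean and (biased) variance of a list of floats."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


def _normalize(values, mean, var, eps=1e-5):
    """Standardize values with the given statistics."""
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def mixed_norm(batch, weight=0.5):
    """Blend batch-norm and instance-norm outputs.

    batch: list of samples, each a list of activations for one channel.
    weight: 1.0 gives pure batch norm, 0.0 gives pure instance norm.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    # Batch statistics are shared by every sample in the batch.
    batch_mean, batch_var = _mean_var([v for sample in batch for v in sample])
    output = []
    for sample in batch:
        # Instance statistics are computed per sample.
        inst_mean, inst_var = _mean_var(sample)
        bn = _normalize(sample, batch_mean, batch_var)
        inn = _normalize(sample, inst_mean, inst_var)
        output.append([weight * b + (1.0 - weight) * i for b, i in zip(bn, inn)])
    return output


features = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
blended = mixed_norm(features, weight=0.5)
```

With `weight=0.0` every sample is centered on its own mean, which removes per-sample (and hence per-domain) offsets; with `weight=1.0` only the batch as a whole is centered.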


Bayesian Statistics Guided Label Refurbishment Mechanism: Mitigating Label Noise in Medical Image Classification

Mengdi Gao , Ximeng Feng , Mufeng Geng , Zhe Jiang , Lei Zhu , Xiangxi Meng , Chuanqing Zhou , Qiushi Ren , Yanye Lu

Category: Computer Vision | Artificial Intelligence

2021-06-23

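Purpose: Deep neural networks (DNNs) have been widely applied in medical image classification, benefiting from their powerful mapping capability on medical images. However, existing deep-learning-based methods depend on a large number of carefully labeled images. Meanwhile, noise is inevitably introduced during the labeling process, which degrades model performance. It is therefore important to devise robust training strategies that mitigate label noise in medical image classification tasks. Methods: In this work, we propose a novel Bayesian statistics guided label refurbishment mechanism (BLRM) to prevent overfitting to noisy images. BLRM uses the maximum a posteriori probability (MAP) from Bayesian statistics together with a time-weighting technique to selectively correct the labels of noisy images. Once BLRM is activated, the training images are gradually purified over the training epochs, further improving classification performance. Results: Comprehensive experiments on synthetic noisy images (the public OCT and Messidor datasets) and real-world noisy images (Animal-10N) demonstrate that BLRM refurbishes noisy labels selectively, curbing the adverse effects of noisy data. Moreover, the anti-noise BLRM integrated with DNNs is effective at different noise ratios and is independent of the backbone DNN architecture. In addition, BLRM outperforms state-of-the-art anti-noise methods. Conclusions: These investigations indicate that the proposed BLRM is capable of mitigating label noise in medical image classification tasks.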


Learngene: From Open-World to Your Learning Task

Qiufeng Wang , Xin Geng , Shuxia Lin , Shiyu Xia , Lei Qi , Ning Xu

Category: Machine Learning | Artificial Intelligence

2021-06-12

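Although deep learning has made significant progress on fixed large-scale datasets, it typically encounters challenges in detecting unknown/unseen classes in open-world scenarios, over-parameterization, and overfitting to small samples. Biological systems, by contrast, overcome these difficulties well: individuals inherit an innate gene from collective creatures that have evolved over billions of years and then learn new skills from a few examples. Inspired by this, we propose a practical collective-individual paradigm in which an evolution (expandable) network is trained on sequential tasks and then recognizes unknown classes in the real world. Moreover, we propose the learngene, i.e., the gene for learning the initialization rules of the target model, which inherits meta-knowledge from the collective model and reconstructs a lightweight individual model on the target task. In particular, a novel criterion based on gradient information is proposed to discover the learngene in the collective model. Finally, the individual model is trained with only a few samples on the target learning task. We demonstrate the effectiveness of our approach through an extensive empirical study and theoretical analysis.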


Surveillance Face Anti-spoofing

Hao Fang , Ajian Liu , Jun Wan , Sergio Escalera , Chenxu Zhao , Xu Zhang , Stan Z. Li , Zhen Lei

Category: Computer Vision

2023-01-03

Face Anti-spoofing (FAS) is essential to secure face recognition systems from various physical attacks. However, recent research generally focuses on short-distance applications (e.g., phone unlocking) while lacking consideration of long-distance scenes (e.g., surveillance security checks). To promote relevant research and fill this gap in the community, we collect a large-scale Surveillance High-Fidelity Mask (SuHiFiMask) dataset captured under 40 surveillance scenes, which has 101 subjects from different age groups with 232 3D attacks (high-fidelity masks), 200 2D attacks (posters, portraits, and screens), and 2 adversarial attacks. In these scenes, low image resolution and noise interference are new challenges for surveillance FAS. Together with the SuHiFiMask dataset, we propose a Contrastive Quality-Invariance Learning (CQIL) network to alleviate the performance degradation caused by image quality from three aspects: (1) An Image Quality Variable module (IQV) is introduced to recover image information associated with discrimination by incorporating a super-resolution network. (2) Generated sample pairs are used to simulate quality variance distributions, helping contrastive learning strategies obtain robust feature representations under quality variation. (3) A Separate Quality Network (SQN) is designed to learn discriminative features independent of image quality. Finally, a large number of experiments verify the quality of the SuHiFiMask dataset and the superiority of the proposed CQIL.


Tsetlin Machine Embedding: Representing Words Using Logical Expressions

Bimal Bhattarai , Ole-Christoffer Granmo , Lei Jiao , Rohan Yadav , Jivitesh Sharma

Category: Natural Language Processing | Artificial Intelligence | Machine Learning

2023-01-02

Embedding words in vector space is a fundamental first step in state-of-the-art natural language processing (NLP). Typical NLP solutions employ pre-defined vector representations to improve generalization by co-locating similar words in vector space. For instance, Word2Vec is a self-supervised predictive model that captures the context of words using a neural network. Similarly, GloVe is a popular unsupervised model incorporating corpus-wide word co-occurrence statistics. Such word embeddings have significantly boosted important NLP tasks, including sentiment analysis, document classification, and machine translation. However, the embeddings are dense floating-point vectors, making them expensive to compute and difficult to interpret. In this paper, we instead propose to represent the semantics of words with a few defining words that are related using propositional logic. To produce such logical embeddings, we introduce a Tsetlin Machine-based autoencoder that learns logical clauses in a self-supervised manner. The clauses consist of contextual words like "black," "cup," and "hot" to define other words like "coffee," thus being human-understandable. We evaluate our embedding approach on several intrinsic and extrinsic benchmarks, outperforming GloVe on six classification tasks. Furthermore, we investigate the interpretability of our embedding using the logical representations acquired during training. We also visualize word clusters in vector space, demonstrating how our logical embeddings co-locate similar words.
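To make the idea of defining a word through logical clauses concrete, here is a minimal sketch of evaluating conjunctive clauses over a set of context words. It is an illustration only: the clause contents beyond "black," "cup," and "hot" (including the negated literal "tea") are invented for the example, and the helper names `clause_fires` and `votes` are hypothetical rather than part of any Tsetlin Machine library.

```python
# Hypothetical sketch: a clause is a conjunction of literals, where each
# literal is a context word that must be present or must be absent.
# The clauses below are illustrative, not learned by a real Tsetlin Machine.


def clause_fires(context_words, required, forbidden=()):
    """True when every required word is present and no forbidden word is."""
    context = set(context_words)
    return set(required) <= context and context.isdisjoint(forbidden)


def votes(context_words, clauses):
    """Count how many clauses are satisfied by the context."""
    return sum(
        clause_fires(context_words, required, forbidden)
        for required, forbidden in clauses
    )


# Two illustrative clauses "defining" the word "coffee".
coffee_clauses = [
    (("black", "cup", "hot"), ()),  # black AND cup AND hot
    (("hot", "cup"), ("tea",)),     # hot AND cup AND NOT tea (assumed literal)
]

score = votes(["a", "hot", "black", "drink", "in", "a", "cup"], coffee_clauses)
```

Because each clause is a plain conjunction of words, a reader can inspect exactly which context words caused a clause to fire, which is the interpretability property the abstract emphasizes.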
